Abstract

We study the problem of estimating a manifold from random samples. 
In particular, we consider piecewise constant and piecewise linear estima- 
tors induced by k-means and k-flats, and analyze their performance. We 
extend previous results for k-means in two separate directions. First, we 
provide new results for k-means reconstruction on manifolds and, secondly, 
we prove reconstruction bounds for higher-order approximation (k-flats), 
for which no known results were previously available. While the results for 
k-means are novel, some of the technical tools are well-established in the 
literature. In the case of k-flats, both the results and the mathematical
tools are new. 

1 Introduction 

Our study is broadly motivated by questions in high-dimensional learning. As 
is well known, learning in high dimensions is feasible only if the data distri- 
bution satisfies suitable prior assumptions. One such assumption is that the 
data distribution lies on, or is close to, a low-dimensional set embedded in a 
high dimensional space, for instance a low dimensional manifold. This latter 
assumption has proved to be useful in practice, as well as amenable to theo- 
retical analysis, and it has led to a significant amount of recent work. Starting 
from [531[3U|S], this set of ideas, broadly referred to as manifold learning, has 
been applied to a variety of problems from supervised [35] and semi-supervised 
learning [6], to clustering [37] and dimensionality reduction [5], to name a few. 

Interestingly, the problem of learning the manifold itself has received less
attention: given samples from a d-manifold M embedded in some ambient
space X, the problem is to learn a set that approximates M in a suitable
sense. This problem has been considered in computational geometry, but in a
setting in which typically the manifold is a hyper-surface in a low-dimensional
space (e.g. $\mathbb{R}^3$), and the data are typically not sampled probabilistically, see for
instance [53]. The problem of learning a manifold is also related to that of
estimating the support of a distribution (see [T3J [2] for recent surveys). In this
context, some of the distances considered to measure approximation quality are
the Hausdorff distance, and the so-called excess mass distance.

The reconstruction framework that we consider is related to the work of [TJ 
32 , as well as to the framework proposed in [3D], in which a manifold is ap- 
proximated by a set, with performance measured by an expected distance to 
this set. This setting is similar to the problem of dictionary learning (see for 
instance [29], and extensive references therein), in which a dictionary is found by
minimizing a similar reconstruction error, perhaps with additional constraints 
on an associated encoding of the data. Crucially, while the dictionary is learned 
on the empirical data, the quantity of interest is the expected reconstruction 
error, which is the focus of this work. 

We analyze this problem by focusing on two important, and widely used
algorithms, namely k-means and k-flats. The k-means algorithm can be seen to
define a piecewise constant approximation of M. Indeed, it induces a Voronoi
decomposition on M, in which each Voronoi region is effectively approximated
by a fixed mean. Given this, a natural extension is to consider higher order
approximations, such as those induced by discrete collections of k d-dimensional
affine spaces (k-flats), with possibly better resulting performance. Since M is
a d-manifold, the k-flats approximation naturally resembles the way in which a
manifold is locally approximated by its tangent bundle.

Our analysis extends previous results for k-means to the case in which the 
data-generating distribution is supported on a manifold, and provides analo- 
gous results for k-flats. We note that the k-means algorithm has been widely 
studied, and thus much of our analysis in this case involves the combination of 
known facts to obtain novel results. The analysis of k-flats, however, requires 
developing substantially new mathematical tools. 

The rest of the paper is organized as follows. In Section 2 we describe the
formal setting and the algorithms that we study. We begin our analysis by
discussing the reconstruction properties of k-means in Section 3. In Section 4
we present and discuss our main results, whose proofs are postponed to the
appendices.

2 Learning Manifolds 

Let X be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$, endowed with a Borel proba-
bility measure $\rho$ supported over a compact, smooth d-manifold M. We assume
the data to be given by a training set, in the form of samples $X_n = (x_1, \ldots, x_n)$
drawn identically and independently with respect to $\rho$.

Our goal is to learn a set $S_n$ that approximates the manifold well. The approx-
imation (learning error) is measured by the expected reconstruction error

$$\mathcal{E}_\rho(S_n) := \int_M d\rho(x)\, d_X^2(x, S_n), \qquad (1)$$

where the distance to a set $S \subseteq X$ is $d_X^2(x, S) = \inf_{x' \in S} d_X^2(x, x')$, with $d_X(x, x') =
\|x - x'\|$. This is the same reconstruction measure that has been the recent focus



It is easy to see that any set such that $S \supseteq M$ will have zero risk, with M
being the "smallest" such set (with respect to set containment). In other words,
the above error measure does not introduce an explicit penalty on the "size" of
$S_n$: enlarging any given $S_n$ can never increase the learning error.
With this observation in mind, we study specific learning algorithms that, given
the data, produce a set belonging to some restricted hypothesis space H (e.g.
sets of size k for k-means), which effectively introduces a constraint on the size
of the sets. Finally, note that the risk of Equation (1) is non-negative and, if the
hypothesis space is sufficiently rich, the risk of an unsupervised algorithm may
converge to zero under suitable conditions.

2.1 Using K-Means and K-Flats for Piecewise Manifold 
Approximation 

In this work, we focus on two specific algorithms, namely k-means [28H2Z] and 
k-flats 9 a . Although typically discussed in the Euclidean space case, their defi- 
nition can be easily extended to a Hilbert space setting. The study of manifolds 
embedded in a Hilbert space is of special interest when considering non-linear 
(kernel) versions of the algorithms [TS] . More generally, this setting can be seen 
as a limit case when dealing with high dimensional data. Naturally, the more 
classical setting of an absolutely continuous distribution over d-dimensional
Euclidean space is simply a particular case, in which $X = \mathbb{R}^d$, and M is a domain
with positive Lebesgue measure. 

K-Means. Let $H = \mathcal{S}_k$ be the class of sets of size k in X. Given a training set
$X_n$ and a choice of k, k-means is defined by the minimization over $S \in \mathcal{S}_k$ of
the empirical reconstruction error

$$\mathcal{E}_n(S) := \frac{1}{n} \sum_{i=1}^{n} d_X^2(x_i, S), \qquad (2)$$

where, for any fixed set S, $\mathcal{E}_n(S)$ is an unbiased empirical estimate of $\mathcal{E}_\rho(S)$, so
that k-means can be seen to be performing a kind of empirical risk minimization.
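
As a concrete illustration of the empirical reconstruction error of Equation (2), the following minimal sketch evaluates it for a finite set S in Euclidean space. The function names and the toy data are illustrative assumptions, not part of the original text.

```python
def sq_dist_to_set(x, S):
    """Squared Euclidean distance from the point x to the nearest element of S."""
    return min(sum((xi - si) ** 2 for xi, si in zip(x, s)) for s in S)


def empirical_reconstruction_error(X, S):
    """Average over the samples in X of the squared distance to the set S."""
    return sum(sq_dist_to_set(x, S) for x in X) / len(X)


# Toy data (hypothetical): three samples and a candidate set of size k = 2.
X = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
S = [(0.0, 0.0), (2.0, 2.0)]
err = empirical_reconstruction_error(X, S)

# Enlarging S can never increase the error, as noted in the text.
err_larger = empirical_reconstruction_error(X, S + [(2.0, 0.0)])
```

Here the first sample lies in S and the other two are at squared distance 4 from it, so the average is 8/3; adding a third element to S lowers it.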



A minimizer of Equation (2) on $\mathcal{S}_k$ is a discrete set of k means $S_{n,k} =
\{m_1, \ldots, m_k\}$, which induces a Dirichlet-Voronoi tiling of X: a collection of
k regions, each closest to a common mean E] (in our notation, the subscript n
denotes the dependence of $S_{n,k}$ on the sample, while k refers to its size). By
virtue of $S_{n,k}$ being a minimizing set, each mean must occupy the center of mass
of the samples in its Voronoi region. These two facts imply that it is possible
to compute a local minimum of the empirical risk by using a greedy coordinate-
descent relaxation, namely Lloyd's algorithm [27]. Furthermore, given a finite
sample $X_n$, the number of locally-minimizing sets $S_{n,k}$ is also finite since (by
the center-of-mass condition) there cannot be more than the number of pos-
sible partitions of $X_n$ into k groups, and therefore the global minimum must
be attainable. Even though Lloyd's algorithm provides no guarantees of close-
ness to the global minimizer, in practice it is possible to use a randomized
approximation algorithm, such as k-means++ [2] (see appendix), which provides
guarantees of approximation to the global minimum in expectation with respect
to the randomization.
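
The alternation described above (assign each sample to its closest mean, then move each mean to the center of mass of its region) can be sketched as follows. This is a minimal illustration in plain Python, assuming Euclidean data given as tuples; the seeding step follows the usual description of k-means++ (each new center is drawn with probability proportional to its squared distance to the centers chosen so far). The function names and the toy data are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeanspp_seed(X, k, rng):
    """Seeding: draw each new center with probability proportional to d^2."""
    centers = [rng.choice(X)]
    while len(centers) < k:
        d2 = [min(sq_dist(x, c) for c in centers) for x in X]
        r = rng.random() * sum(d2)
        acc = 0.0
        for x, w in zip(X, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    return centers


def lloyd(X, k, rng, iters=100):
    """Lloyd iterations: Voronoi assignment, then center-of-mass update."""
    centers = kmeanspp_seed(X, k, rng)
    for _ in range(iters):
        regions = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda j: sq_dist(x, centers[j]))
            regions[j].append(x)
        # An empty region keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(reg) for col in zip(*reg)) if reg else centers[j]
            for j, reg in enumerate(regions)
        ]
        if new_centers == centers:  # fixed point: a local minimum was reached
            break
        centers = new_centers
    return centers


# Toy data (hypothetical): two tight, well-separated groups in the plane.
rng = random.Random(0)
X = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
X += [(rng.gauss(100, 0.1), rng.gauss(100, 0.1)) for _ in range(50)]
centers = sorted(lloyd(X, 2, rng))
```

On such well-separated data the two returned means should sit near the two group centers; in general the procedure only reaches a local minimum of the empirical error.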

K-Flats. Let $H = \mathcal{F}_k$ be the class of collections of k flats (affine spaces) of
dimension d. For any value of k, k-flats, analogously to k-means, aims at finding
the set $F_{n,k} \in \mathcal{F}_k$ that minimizes the empirical reconstruction error (2) over $\mathcal{F}_k$. By
an argument similar to the one used for k-means, a global minimizer must be
attainable, and a Lloyd-type relaxation converges to a local minimum. Note
that, in this case, given a Voronoi partition of M into regions closest to each
d-flat, new optimizing flats for that partition can be computed by a d-truncated
PCA solution on the samples falling in each region.
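
The Lloyd-type relaxation for k-flats can be sketched in the simplest possible case, which is an assumption of this example rather than the general setting of the text: 1-dimensional flats (lines) in the plane. In that case the truncated PCA step has a closed form, since the principal axis of a 2x2 scatter matrix is given by an angle. The function names and the circle data are illustrative.

```python
import math
import random


def fit_line(points):
    """Best-fit line through 2-D points: centroid plus leading PCA direction."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed-form principal axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def sq_dist_to_line(p, line):
    (cx, cy), (ux, uy) = line
    dx, dy = p[0] - cx, p[1] - cy
    along = dx * ux + dy * uy
    return max(dx * dx + dy * dy - along * along, 0.0)


def empirical_error(X, lines):
    return sum(min(sq_dist_to_line(p, l) for l in lines) for p in X) / len(X)


def k_lines(X, k, rng, iters=50):
    """Alternate nearest-line assignment and per-region PCA refitting."""
    lines = []
    for p in rng.sample(X, k):  # random initial lines through sample points
        a = rng.uniform(0.0, math.pi)
        lines.append((p, (math.cos(a), math.sin(a))))
    history = [empirical_error(X, lines)]
    for _ in range(iters):
        regions = [[] for _ in range(k)]
        for p in X:
            j = min(range(k), key=lambda j: sq_dist_to_line(p, lines[j]))
            regions[j].append(p)
        lines = [fit_line(r) if r else lines[j] for j, r in enumerate(regions)]
        history.append(empirical_error(X, lines))
        if history[-2] - history[-1] < 1e-12:
            break
    return lines, history


# Toy data (hypothetical): samples from the unit circle, a 1-manifold in R^2.
rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
X = [(math.cos(t), math.sin(t)) for t in angles]
lines, history = k_lines(X, 4, rng)
```

Each step can only lower the empirical error for the current partition, so `history` is non-increasing up to rounding, which mirrors the convergence argument in the text. For general d and higher ambient dimension the refitting step would use a d-truncated PCA instead of the closed-form angle.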

2.2 Learning a Manifold with K-means and K-flats

In practice, k-means is often interpreted to be a clustering algorithm, with clus-
ters defined by the Voronoi diagram of the set of means $S_{n,k}$. In this interpreta-
tion, Equation (2) is simply rewritten by summing over the Voronoi regions, and
adding all pairwise distances between samples in the region (the intra-cluster
distances). For instance, this point of view is considered in where k-means is
studied from an information theoretic perspective. K-means can also be inter-
preted to be performing vector quantization, where the goal is to minimize the
encoding error associated to a nearest-neighbor quantizer [17]. Interestingly, in
the limit of increasing sample size, this problem coincides, in a precise sense [33],
with the problem of optimal quantization of probability distributions (see for
instance the excellent monograph of [18]).

When the data-generating distribution is supported on a manifold M, k-
means can be seen to be approximating points on the manifold by a discrete
set of means. Analogously to the Euclidean setting, this induces a Voronoi
decomposition of M, in which each Voronoi region is effectively approximated by
a fixed mean (in this sense k-means produces a piecewise constant approximation
of M). As in the Euclidean setting, the limit of this problem with increasing
sample size is precisely the problem of optimal quantization of distributions on
manifolds, which is the subject of significant recent work in the field of optimal
quantization [20, 21].

In this paper, we take the above view of k-means as defining a (piecewise
constant) approximation of the manifold M supporting the data distribution.
In particular, we are interested in the behavior of the expected reconstruction
error $\mathcal{E}_\rho(S_{n,k})$, for varying k and n. This perspective has an interesting relation
with dictionary learning, in which one is interested in finding a dictionary, and
an associated representation, that allows one to approximately reconstruct a finite
set of data-points/signals. In this interpretation, the set of means can be seen
as a dictionary of size k that produces a maximally sparse representation (the
k-means encoding), see for example and references therein. Crucially, while
the dictionary is learned on the available empirical data, the quantity of interest
is the expected reconstruction error, and the question of characterizing the
performance with respect to this latter quantity naturally arises.

Since k-means produces a piecewise constant approximation of the data, a
natural idea is to consider higher orders of approximation, such as approxi-
mation by discrete collections of k d-dimensional affine spaces (k-flats), with
possibly better performance. Since M is a d-manifold, the approximation in-
duced by k-flats may more naturally resemble the way in which a manifold is
locally approximated by its tangent bundle. We provide in Sec. 4.2 a partial
answer to this question.

3 Reconstruction Properties of k-Means 

Since we are interested in the behavior of the expected reconstruction error (1) of
k-means and k-flats for varying k and n, before analyzing this behavior, we 
consider what is currently known about this problem, based on previous work. 
While k-flats is a relatively new algorithm whose behavior is not yet well un- 
derstood, several properties of k-means are currently known. 

Recall that k-means finds a discrete set $S_{n,k}$ of size k that best approximates
the samples in the sense of (2). Clearly, as k increases, the empirical reconstruc-
tion error $\mathcal{E}_n(S_{n,k})$ cannot increase, and typically decreases. However, we are
ultimately interested in the expected reconstruction error, and therefore would
like to understand the behavior of $\mathcal{E}_\rho(S_{n,k})$ with varying k, n.

In the context of optimal quantization, the behavior of the expected recon-
struction error $\mathcal{E}_\rho$ has been considered for an approximating set $S_k$ obtained
by minimizing the expected reconstruction error itself over the hypothesis space
$H = \mathcal{S}_k$. The set $S_k$ can thus be interpreted as the output of a population,
or infinite sample version of k-means. In this case, it is possible to show that
$\mathcal{E}_\rho(S_k)$ is a non-increasing function of k and, in fact, to derive explicit rates. For
example in the case $X = \mathbb{R}^d$, and under fairly general technical assumptions, it
is possible to show that $\mathcal{E}_\rho(S_k) = O(k^{-2/d})$, where the constants depend on $\rho$
and d [18].

In machine learning, the properties of k-means have been studied, for fixed
k, by considering the excess reconstruction error $\mathcal{E}_\rho(S_{n,k}) - \mathcal{E}_\rho(S_k)$. In partic-
ular, this quantity has been studied for $X = \mathbb{R}^d$, and shown to be, with high
probability, of order $\sqrt{kd/n}$, up to logarithmic factors [31]. The case where X
is a Hilbert space has been considered in [301 E], where an upper bound of order
$k/\sqrt{n}$ is proven to hold with high probability. The more general setting where
X is a metric space has been studied in [7].

Figure 1: We consider the behavior of k-means for data sets obtained by sam-
pling uniformly a 19 dimensional sphere embedded in $\mathbb{R}^{20}$ (left). For each value
of k, k-means (with k-means++ seeding) is run 20 times, and the best solution
kept. The reconstruction performance on a (large) hold-out set is reported as
a function of k. The results for four different training set cardinalities are re-
ported: for small number of points, the reconstruction error decreases sharply
for small k and then increases, while it is simply decreasing for larger data sets.
A similar experiment, yielding similar results, is performed on subsets of the
MNIST (http://yann.lecun.com/exdb/mnist) database (right). In this case
the data might be thought to be concentrated around a low dimensional mani-
fold. For example [22] report an average intrinsic dimension d for each digit to
be between 10 and 13.

When analyzing the behavior of $\mathcal{E}_\rho(S_{n,k})$, and in the particular case that
$X = \mathbb{R}^d$, the above results can be combined to obtain, with high probability, a
bound of the form

$$\mathcal{E}_\rho(S_{n,k}) = \mathcal{E}_\rho(S_{n,k}) - \mathcal{E}_\rho(S_k) + \mathcal{E}_\rho(S_k) \le C \left( \sqrt{\frac{kd}{n}} + k^{-2/d} \right) \qquad (3)$$

up to logarithmic factors, where the constant C does not depend on k or n.
The above inequality suggests a somewhat surprising effect: the expected re-
construction properties of k-means may be described by a trade-off between a
statistical error (of order $\sqrt{kd/n}$) and a geometric approximation error (of order
$k^{-2/d}$).
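
To make the shape of this trade-off concrete, the following sketch evaluates the right-hand side of the bound (3) over k and locates its minimizer numerically. It assumes C = 1 and ignores the logarithmic factors, so the numbers are only indicative; the function names are hypothetical.

```python
import math


def tradeoff_bound(k, n, d):
    """Statistical term sqrt(k d / n) plus approximation term k^(-2/d), with C = 1."""
    return math.sqrt(k * d / n) + k ** (-2.0 / d)


def best_k(n, d):
    """Value of k in 1..n that minimizes the bound, found by exhaustive search."""
    return min(range(1, n + 1), key=lambda k: tradeoff_bound(k, n, d))


k_small = best_k(1000, 2)
k_large = best_k(10000, 2)
```

Under these assumptions the minimizer lies strictly between 1 and n and grows with the sample size, which is the behavior that motivates choosing k as a function of n in the results below.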

The existence of such a tradeoff between the approximation, and the sta- 
tistical errors may itself not be entirely obvious, see the discussion in [J]. For 
instance, in the k-means problem, it is intuitive that, as more means are inserted, 
the expected distance from a random sample to the means should decrease, and 
one might expect a similar behavior for the expected reconstruction error. This 
observation naturally begs the question of whether and when this trade-off re- 
ally exists or if it is simply a result of the looseness in the bounds. In particular, 
one could ask how tight the bound (3) is.

Figure 2: The optimal k-means (red) computed from n = 2 samples drawn uniformly
on $S^{100}$ (blue). For (a) k = 1, the expected squared distance to a random point $x \in S^{100}$
is $\mathcal{E}_\rho(S_{k=1}) \approx 1.5$, while for (b) k = 2, it is $\mathcal{E}_\rho(S_{k=2}) \approx 2$.

The bound on $\mathcal{E}_\rho(S_k)$ is known to be tight for k sufficiently big [18], while
a lower bound on $\mathcal{E}_\rho(S_{n,k}) - \mathcal{E}_\rho(S_k)$ (in expectation) is proved in [4] to be
of order $\sqrt{k^{1-4/d}/n}$. This latter result would essentially predict an increasing
behavior in k, for any d > 4. Whenever a trade-off holds, it may be used
to justify a heuristic for choosing k empirically as the value that minimizes
the reconstruction error in a hold-out set. A nice discussion on this point is
given in [4], where it is pointed out that "the exact dependence of the minimax
distortion redundancy on k and d is still a challenging open problem." Indeed,
it would be interesting to derive a lower bound for $\mathcal{E}_\rho(S_{n,k})$ itself.

In Figure 1 we perform some simple numerical simulations showing that the
trade-off indeed occurs in certain regimes. The following example provides a 
situation where a trade-off can be easily shown to occur. 

Example 1. Consider a setup in which n = 2 samples are drawn from a uniform
distribution on the unit d = 100-sphere, though the argument holds for other n
much smaller than d. Because $d \gg n$, with high probability, the samples are
nearly orthogonal: $\langle x_1, x_2 \rangle_X \approx 0$, while a third sample x drawn uniformly on
$S^{100}$ will also very likely be nearly orthogonal to both $x_1, x_2$ f%5Jj. The k-means
solution on this dataset is clearly $S_{k=1} = \{(x_1 + x_2)/2\}$ (Fig. 2(a)). Indeed,
since $S_{k=2} = \{x_1, x_2\}$ (Fig. 2(b)), it is $\mathcal{E}_\rho(S_{k=1}) \approx 1.5 < 2 \approx \mathcal{E}_\rho(S_{k=2})$ with
very high probability: for the midpoint $m = (x_1 + x_2)/2$ we have $\|m\|^2 \approx 1/2$ and
$\|x - m\|^2 \approx 1 + \|m\|^2$, whereas $\|x - x_i\|^2 \approx 2$. In this case, it is better to place a
single mean closer to the origin (with $\mathcal{E}_\rho(\{0\}) = 1$), than to place two means at
the sample locations. This example is sufficiently simple that the exact k-means
solution is known, but the effect can be observed in more complex settings.
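
The numbers in Example 1 can be checked with a short Monte Carlo simulation. The sketch below assumes that $S^{100}$ denotes the unit sphere of $\mathbb{R}^{101}$ and samples it by normalizing Gaussian vectors; the function names, the seed and the number of trials are arbitrary choices made for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform sample on the unit sphere of R^dim (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def expected_error(S, dim, rng, trials=2000):
    """Monte Carlo estimate of the expected squared distance to the set S."""
    total = 0.0
    for _ in range(trials):
        x = random_unit_vector(dim, rng)
        total += min(sq_dist(x, s) for s in S)
    return total / trials


rng = random.Random(0)
dim = 101  # assumption: S^100 is embedded in R^101
x1 = random_unit_vector(dim, rng)
x2 = random_unit_vector(dim, rng)
midpoint = [(a + b) / 2.0 for a, b in zip(x1, x2)]

risk_k1 = expected_error([midpoint], dim, rng)           # k = 1: the sample mean
risk_k2 = expected_error([x1, x2], dim, rng)             # k = 2: the two samples
risk_origin = expected_error([[0.0] * dim], dim, rng)    # a single mean at 0
```

With these settings the estimate for k = 1 should land near 1.5 and the one for k = 2 somewhat below 2, so the single mean has the smaller expected error, while the origin gives an error of 1 up to rounding.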



4 Main Results 

Contributions. Our work extends previous results in two different directions: 

(a) We provide an analysis of k-means for the case in which the data-generating 
distribution is supported on a manifold embedded in a Hilbert space. In 
particular, in this setting: 1) we derive new results on the approximation 
error, and 2) new sample complexity results (learning rates) arising from
the choice of k by optimizing the resulting bound. We analyze the case 
in which a solution is obtained from an approximation algorithm, such as 
k-means++ [2], to include this computational error in the bounds. 

(b) We generalize the above results from k-means to k-flats, deriving learning 
rates obtained from new bounds on both the statistical and the approxi- 
mation errors. To the best of our knowledge, these results provide the first 
theoretical analysis of k-flats in either sense. 

We note that the k-means algorithm has been widely studied in the past, 
and much of our analysis in this case involves the combination of known facts to 
obtain novel results. However, in the case of k-flats, there is currently no known 
analysis, and we provide novel results as well as new performance bounds for 
each of the components in the bounds. 

Throughout this section we make the following technical assumption: 

Assumption 1. M is a d-manifold contained in the unit ball in X, with volume
measure denoted by $\mu_I$. The probability measure $\rho$ is absolutely continuous with
respect to $\mu_I$, with density p.

4.1 Learning Rates for k-Means 

We begin discussing results for k-means. In this case we make the following 
additional assumption: 

Assumption 2. Assume the manifold M to have a metric of class $C^1$.

The first result considers the idealized case where we have access to an exact 
solution for k-means. 



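Theorem 1. Under Assumptions 1 and 2, if $S_{n,k}$ is a solution of k-means then, for
$0 < \delta < 1$, there are constants C and $\gamma$ dependent only on d, and sufficiently
large n' such that, by setting

$$k_n = n^{\frac{d}{2(d+2)}} \cdot \left( \frac{C}{24\sqrt{\pi}} \right)^{d/(d+2)} \cdot \left\{ \int_M d\mu_I(x)\, p(x)^{d/(d+2)} \right\}, \qquad (4)$$

and $S_n = S_{n,k_n}$, it is

$$\mathbb{P}\left[ \mathcal{E}_\rho(S_n) \le \gamma \cdot n^{-1/(d+2)} \cdot \sqrt{\ln 1/\delta} \cdot \left\{ \int_M d\mu_I(x)\, p(x)^{d/(d+2)} \right\} \right] \ge 1 - \delta, \qquad (5)$$

for all $n \ge n'$, where $C \sim d/(2\pi e)$ and $\gamma$ grows sublinearly with d.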



Remark 1. Note that the distinction between distributions with density in M,
and singular distributions is important. The bound of Equation (5) holds only
when the absolutely continuous part of $\rho$ over M is non-vanishing; the case in
which the distribution is singular over M requires a different analysis, and may
result in faster convergence rates.

The following result considers the case where the k-means++ algorithm is
used to compute the estimator.

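Theorem 2. Under Assumptions 1 and 2, if $S_{n,k}$ is the solution of k-means++,
then for $0 < \delta < 1$, there are constants C and $\gamma$ that depend only on d, and a
sufficiently large n' such that, by setting

$$k_n = n^{\frac{d}{2(d+2)}} \cdot \left( \frac{C}{24\sqrt{\pi}} \right)^{d/(d+2)} \cdot \left\{ \int_M d\mu_I(x)\, p(x)^{d/(d+2)} \right\}, \qquad (6)$$

and $S_n = S_{n,k_n}$, it is

$$\mathbb{P}\left[ \mathbb{E}_Z\, \mathcal{E}_\rho(S_n) \le \gamma \cdot n^{-1/(d+2)} \left( \ln n + \ln \|p\|_{d/(d+2)} \right) \cdot \sqrt{\ln 1/\delta} \cdot \left\{ \int_M d\mu_I(x)\, p(x)^{d/(d+2)} \right\} \right] \ge 1 - \delta, \qquad (7)$$

for all $n \ge n'$, where the expectation is with respect to the random choice Z in
the algorithm, and $\|p\|_{d/(d+2)} = \left\{ \int_M d\mu_I(x)\, p(x)^{d/(d+2)} \right\}^{(d+2)/d}$, $C \sim d/(2\pi e)$,
and $\gamma$ grows sublinearly with d.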



Remark 2. In the particular case that $X = \mathbb{R}^d$ and M is contained in the unit
ball, we may further bound the distribution-dependent part of the bounds in
Theorems 1 and 2. Using Hölder's inequality, one obtains

$$\int_M d\nu(x)\, p(x)^{d/(d+2)} \le \left\{ \int_M d\nu(x)\, p(x) \right\}^{d/(d+2)} \cdot \left\{ \int_M d\nu(x) \right\}^{2/(d+2)} \le \mathrm{Vol}(M)^{2/(d+2)} \le \omega_d^{2/(d+2)}, \qquad (8)$$

where $\nu$ is the Lebesgue measure in $\mathbb{R}^d$, and $\omega_d$ is the volume of the d-dimensional
unit ball.

It is clear from the proof of Theorem 1 that, in this case, we may choose

$$k_n = n^{\frac{d}{2(d+2)}} \cdot \left( \frac{C}{24\sqrt{\pi}} \right)^{d/(d+2)} \cdot \omega_d^{2/d},$$

independently of the density p, to obtain a bound $\mathcal{E}_\rho(S_n) = O\left( n^{-1/(d+2)} \sqrt{\ln 1/\delta} \right)$
with probability $1 - \delta$ (and similarly for Theorem 2, except for an additional $\ln n$
term), where the constant only depends on the dimension.

Remark 3. Note that according to the above theorems, choosing k requires
knowledge of properties of the distribution $\rho$ underlying the data, such as the
intrinsic dimension of the support. In fact, following the ideas in [36], Section
6.3-5, it is easy to prove that choosing k to minimize the reconstruction error
on a hold-out set achieves the same learning rates (up to a logarithmic
factor), adaptively in the sense that knowledge of properties of $\rho$ is not needed.
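
The hold-out strategy of Remark 3 can be sketched as follows: fit k-means on the training set for each candidate k, and keep the k whose means give the smallest reconstruction error on held-out samples. The sketch uses plain Lloyd iterations with random restarts on samples from the unit circle; the function names, the candidate grid and the sample sizes are assumptions of this example, and it does not reproduce the analysis referred to in the remark.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def reconstruction_error(X, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in X) / len(X)


def lloyd(X, k, rng, iters=50):
    """Plain Lloyd iterations from a random initialization."""
    centers = rng.sample(X, k)
    for _ in range(iters):
        regions = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda j: sq_dist(x, centers[j]))
            regions[j].append(x)
        centers = [
            tuple(sum(col) / len(reg) for col in zip(*reg)) if reg else centers[j]
            for j, reg in enumerate(regions)
        ]
    return centers


def select_k(train, holdout, candidates, rng, restarts=5):
    """Return the candidate k with the smallest hold-out reconstruction error."""
    scores = {}
    for k in candidates:
        runs = [lloyd(train, k, rng) for _ in range(restarts)]
        best = min(runs, key=lambda c: reconstruction_error(train, c))
        scores[k] = reconstruction_error(holdout, best)
    return min(scores, key=scores.get), scores


def circle_sample(n, rng):
    """n points drawn uniformly from the unit circle in the plane."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(math.cos(t), math.sin(t)) for t in angles]


rng = random.Random(0)
train, holdout = circle_sample(40, rng), circle_sample(400, rng)
k_star, scores = select_k(train, holdout, [1, 2, 4, 8, 16, 32], rng)
```

The selection uses only the data, not the intrinsic dimension or the density, which is the sense in which the choice is adaptive.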
4.2 Learning Rates for k-Flats

To study k-flats, we need to slightly strengthen Assumption 2 by replacing it
by the following:

Assumption 3. Assume the manifold M to have a metric of class $C^3$.

One reason for the higher-smoothness assumption is that k-flats uses higher 
order approximation, whose analysis requires a higher order of differentiability. 
We begin by providing a result for k-flats on hypersurfaces (codimension one), 
and next extend it to manifolds in more general spaces. 



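Theorem 3. Let $X = \mathbb{R}^{d+1}$. Under Assumptions 1 and 3, if $F_{n,k}$ is a solution of
k-flats, then there is a constant C that depends only on d, and sufficiently large
n' such that, by setting

$$k_n = n^{\frac{d}{2(d+4)}} \cdot \left( \frac{C}{2\sqrt{2\pi d}} \right)^{d/(d+4)} \cdot (\kappa_M)^{4/(d+4)}, \qquad (9)$$

and $F_n = F_{n,k_n}$, then for all $n \ge n'$ it is

$$\mathbb{P}\left[ \mathcal{E}_\rho(F_n) \le 2 (8\pi d)^{2/(d+4)} C^{d/(d+4)} \cdot n^{-2/(d+4)} \cdot \sqrt{\tfrac{1}{2} \ln 1/\delta} \cdot (\kappa_M)^{4/(d+4)} \right] \ge 1 - \delta, \qquad (10)$$

where $\kappa_M := \mu_{|II|}(M) = \int_M d\mu_I(x)\, |\kappa_G(x)|^{1/2}$ is the total root curvature of M,
$\mu_{|II|}$ is the measure associated with the (positive) second fundamental form, and
$\kappa_G$ is the Gaussian curvature on M.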

In the more general case of a d-manifold M (with metric in $C^3$) embedded in a
separable Hilbert space X, we cannot make any assumption on the codimension
of M (the dimension of the orthogonal complement to the tangent space at
each point). In particular, the second fundamental form II, which is an extrinsic
quantity describing how the tangent spaces bend locally, is, at every $x \in M$, a
map $II_x : T_x M \to (T_x M)^\perp$ (in this case of class $C^1$ by Assumption 3) from the
tangent space to its orthogonal complement ($II(x) := B(x, x)$ in the notation
of [TBI p. 128]). Crucially, in this case, we may no longer assume the dimension
of the orthogonal complement $(T_x M)^\perp$ to be finite.

Denote by $|II_x| = \sup_{r \in T_x M,\ \|r\| \le 1} \|II_x(r)\|_X$ the operator norm of $II_x$. We have:



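Theorem 4. Under Assumptions 1 and 3, if $F_{n,k}$ is a solution to the k-flats problem,
then there is a constant C that depends only on d, and sufficiently large n' such
that, by setting

$$k_n = n^{\frac{d}{2(d+4)}} \cdot \left( \frac{C}{2\sqrt{2\pi d}} \right)^{d/(d+4)} \cdot (\kappa_M)^{4/(d+4)}, \qquad (11)$$

and $F_n = F_{n,k_n}$, then for all $n \ge n'$ it is

$$\mathbb{P}\left[ \mathcal{E}_\rho(F_n) \le 2 (8\pi d)^{2/(d+4)} C^{d/(d+4)} \cdot n^{-2/(d+4)} \cdot \sqrt{\tfrac{1}{2} \ln 1/\delta} \cdot (\kappa_M)^{4/(d+4)} \right] \ge 1 - \delta, \qquad (12)$$

where $\kappa_M := \int_M d\mu_I(x)\, |II_x|^{d/2}$.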


Note that the better k-flats bounds stem from the higher approximation
power of d-flats over points. Although this greatly complicates the setup and
proofs, as well as the analysis of the constants, the resulting bounds are of order
$O(n^{-2/(d+4)})$, compared with the slower order $O(n^{-1/(d+2)})$ of k-means.
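
The gap between the two rates can be made concrete with a small numerical comparison. The sketch below evaluates only the sample-size factors $n^{-1/(d+2)}$ and $n^{-2/(d+4)}$, ignoring all constants and logarithmic terms, so it illustrates the exponents rather than the full bounds; the function names are hypothetical.

```python
def kmeans_rate(n, d):
    """Sample-size factor n^(-1/(d+2)) of the k-means bound."""
    return n ** (-1.0 / (d + 2))


def kflats_rate(n, d):
    """Sample-size factor n^(-2/(d+4)) of the k-flats bound."""
    return n ** (-2.0 / (d + 4))


# The k-flats exponent is larger for every d > 0, since
# 2/(d+4) > 1/(d+2) is equivalent to 2d + 4 > d + 4.
n = 10**6
ratios = {d: kflats_rate(n, d) / kmeans_rate(n, d) for d in (1, 2, 5, 10, 20)}
```

Every ratio is below one, and it approaches one as d grows, since both exponents then tend to zero.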

4.3 Discussion 

In all the results, the final performance does not depend on the dimension- 
ality of the embedding space (which in fact can be infinite), but only on the 
intrinsic dimension of the space on which the data-generating distribution is 
defined. The key to these results is an approximation construction in which 
the Voronoi regions on the manifold (points closest to a given mean or flat) are
guaranteed to have vanishing diameter in the limit of k going to infinity. Under 
our construction (see for instance the proof in Appendix B), a hypersurface is 
approximated efficiently by tracking the variation of its tangent spaces by using 
the second fundamental form. Where this form vanishes, the Voronoi regions of 
an approximation will not be ensured to have vanishing diameter with k going 
to infinity, unless certain care is taken in the analysis. 

An important point of interest is that the approximations are controlled 
by averaged quantities, such as the total root curvature (k-flats for surfaces 
of codimension one), total curvature (k-flats in arbitrary codimensions), and
d/(d + 2)-norm of the probability density (k-means), which are integrated over
the domain where the distribution is defined. Note that these types of quanti-
ties have been linked to provably tight approximations in certain cases, such as
for convex manifolds [TH1 H2], in contrast with worst-case methods that place
a constraint on a maximum curvature, or minimum injectivity radius (for in- 
stance [TJ 152"].) Intuitively, it is easy to see that a constraint on an average 
quantity may be arbitrarily less restrictive than one on its maximum. A small 
difficult region (e.g. of very high curvature) may cause the bounds of the latter 
to substantially degrade, while the results presented here would not be adversely 
affected so long as the region is small. 

Additionally, care has been taken throughout to analyze the behavior of 
the constants. In particular, there are no constants in the analysis that grow 
exponentially with the dimension, and in fact, many have polynomial, or slower 
growth. We believe this to be an important point, since this ensures that the 
asymptotic bounds do not hide an additional exponential dependence on the 
dimension. 